Data-Driven Tree Transforms and Metrics

Authors

  • Gal Mishne
  • Ronen Talmon
  • Israel Cohen
  • Ronald R. Coifman
  • Yuval Kluger
Abstract

We consider the analysis of high-dimensional data given in the form of a matrix whose columns are observations and whose rows are features. Often the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis. In this paper, our goal is to organize the data by defining a representation and a metric that respect the smoothness and structure underlying the data. We also aim to generalize the joint clustering of observations and features to the case where the data does not fall into clear disjoint groups. For this purpose, we propose multiscale data-driven transforms and metrics based on trees. Their construction is implemented in an iterative refinement procedure that exploits the co-dependencies between features and observations. Beyond the organization of a single dataset, our approach enables us to transfer the organization learned from one dataset to another and to integrate several datasets together. We present an application to breast cancer gene expression analysis: learning metrics on the genes to cluster the tumor samples into cancer sub-types, and validating the joint organization of both the genes and the samples. We demonstrate that using our approach to combine information from multiple gene expression cohorts, acquired by different profiling technologies, improves the clustering of tumor samples.
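
As an illustration of what a tree-based metric on observations can look like, the following minimal Python sketch compares two observations through their average differences over the folders (nodes) of a partition tree on the features. The folder weighting `(|folder| / n) ** alpha`, the function name `tree_distance`, and the hand-built toy tree are assumptions made for this example; the abstract does not specify the construction, and in the proposed approach the trees are learned from the data rather than fixed by hand.

```python
import numpy as np


def tree_distance(x, y, tree, alpha=1.0):
    """Distance between two observations induced by a partition tree on features.

    `tree` is a list of levels; each level is a list of folders, and each
    folder is a list of feature indices. The distance is a weighted sum,
    over all folders, of the absolute mean difference inside the folder.
    The weight (|folder| / n) ** alpha is an illustrative assumption.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = diff.size
    total = 0.0
    for level in tree:
        for folder in level:
            weight = (len(folder) / n) ** alpha
            total += weight * abs(diff[folder].mean())
    return total


# Hypothetical three-level partition tree on four features:
# the root, two coarse folders, and the four leaves.
tree = [
    [[0, 1, 2, 3]],
    [[0, 1], [2, 3]],
    [[0], [1], [2], [3]],
]

x = np.array([1.0, 1.0, 0.0, 0.0])
y = np.array([0.0, 0.0, 1.0, 1.0])  # differs from x on whole coarse folders
z = np.array([1.0, 0.0, 1.0, 0.0])  # differs from x only at the finest scale

print(tree_distance(x, y, tree))
print(tree_distance(x, z, tree))
```

In this toy setting, `y` is farther from `x` than `z` is, because the difference between `x` and `y` survives averaging over the coarse folders, whereas the difference between `x` and `z` mostly cancels above the leaf level. This is the sense in which such a metric depends on the organization of the features and not only on coordinate-wise differences.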

Similar resources

Real-time quality monitoring in debutanizer column with regression tree and ANFIS

A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of the debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for a debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs and the outp...

Proper Gromov transforms of metrics are metrics

In phylogenetic analysis, a standard problem is to approximate a given metric by an additive metric. Here it is shown that, given a metric D defined on some finite set X and a non-expansive map f : X → R, the one-parameter family of the Gromov transforms D of D relative to f and ∆ that starts with D for large values of ∆ and ends with an additive metric for ∆ = 0 consists exclusively of metrics...

Estimation of Tree Biomass at Individual tree, Sample plot and Hybrid Level using Drone Images

Algorithms that convert two-dimensional images into 3D data raise the hope that the structural properties of trees can be extracted from such images. In this study, the accuracy of biomass estimation at the tree, plot, and hybrid levels using UAV images was investigated. In 34.8 ha of Sisangan Forest Park, 854 images were acquired with a quadcopter from an altitude of 100 meters above ground. SF...

Modeling the potential of Sand and Dust Storm sources formation using time series of remote sensing data, fuzzy logic and artificial neural network (A Case study of Euphrates basin)

Due to the differences between visible and thermal infrared images, combining these two types of images leads to a better understanding of the characteristics of targets and the environment. Thermal infrared images are really useful in distinguishing targets from the background based on radiation differences and in land surface temperature (LST) calculation. However, their spatial resolu...

Intelligent identification of vehicle’s dynamics based on local model network

This paper proposes an intelligent approach for dynamic identification of the vehicles. The proposed approach is based on the data-driven identification and uses a high-performance local model network (LMN) for estimation of the vehicle’s longitudinal velocity, lateral acceleration and yaw rate. The proposed LMN requires no pre-defined standard vehicle model and uses measurement data to identif...

Journal title:
  • CoRR

Volume abs/1708.05768

Publication date 2017